Partitioned Parallel Radix Sort

Authors

  • Shin-Jae Lee
  • Minsoo Jeon
  • Andrew Sohn
  • Dongseung Kim
Abstract

Load-balanced parallel radix sort solved the load imbalance problem of parallel radix sort. By redistributing the keys in each radix round, it gives every processor exactly the same number of keys, thereby reducing the overall sorting time. Load-balanced radix sort is currently the fastest known internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time spent on key redistribution emerges as the bottleneck of overall sorting performance. In this report we present a new parallel radix sorter, called partitioned parallel radix sort, that solves the communication problem of balanced radix sort. The new method reduces the communication time by eliminating the redistribution steps. The keys are first sorted in a top-down fashion (left to right, as opposed to right to left) using a few of the most significant bits. Once the keys are localized to each processor, the rest of the sorting is confined within each processor, eliminating the need for global redistribution of keys. This enables well-balanced communication and computation across processors. The proposed method has been implemented on three different distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions indicate that partitioned parallel radix sort shows significant improvements over balanced radix sort: 13% to 30% in execution time on the IBM SP2, 20% to 100% on the Cray/SGI T3E, and over 2.4-fold on the PC cluster.
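The two-phase idea in the abstract can be sketched in a few lines. The following is a minimal single-process simulation, not the authors' implementation: each "processor" is just a Python list, the 32-bit key width, the choice of `g = 8` leading bits, and the names `partitioned_radix_sort` and `local_radix_sort` are all assumptions made for illustration, and the bucket-to-processor assignment is one simple balancing heuristic rather than the paper's exact rule. On a real distributed-memory machine the histogram step and the key exchange would be collective communication operations.

```python
KEY_BITS = 32  # assumed key width: non-negative integers below 2**32


def local_radix_sort(keys, bits, radix_bits=8):
    """Stable LSD radix sort on the low `bits` bits, run inside one processor."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


def partitioned_radix_sort(per_proc_keys, g=8):
    """Simulate the scheme: one global exchange on the top g bits, then local sorts."""
    p = len(per_proc_keys)
    shift = KEY_BITS - g
    nbuckets = 1 << g

    # Phase 1a: global histogram of the g most significant bits.
    hist = [0] * nbuckets
    for keys in per_proc_keys:
        for k in keys:
            hist[k >> shift] += 1

    # Phase 1b: give each processor a contiguous range of buckets holding
    # roughly total/p keys (heuristic; whole buckets are never split).
    total = sum(hist)
    owner = [0] * nbuckets
    running = 0
    for b in range(nbuckets):
        owner[b] = min(p - 1, running * p // total) if total else 0
        running += hist[b]

    # Phase 1c: the single key exchange (an all-to-all on a real machine).
    received = [{} for _ in range(p)]
    for keys in per_proc_keys:
        for k in keys:
            b = k >> shift
            received[owner[b]].setdefault(b, []).append(k)

    # Phase 2: purely local sorting on the remaining low bits, per bucket.
    result = []
    for proc in received:
        local = []
        for b in sorted(proc):
            local.extend(local_radix_sort(proc[b], shift))
        result.append(local)
    return result
```

Concatenating the returned per-processor lists in processor order yields the globally sorted sequence. Because whole buckets are assigned to processors, the balance depends on how evenly the leading bits are distributed, which is why a real implementation has to choose `g` and the assignment rule with care.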


Similar resources

The Effect of Local Sort on Parallel Sorting Algorithms

We show the importance of sequential sorting in the context of in-memory parallel sorting of large data sets of 64-bit keys. First, we analyze several sequential strategies such as Straight Insertion, Quick sort, Radix sort and CC-Radix sort. As a consequence of the analysis, we propose a new algorithm that we call Sequential Counting Split Radix sort, SCS-Radix sort. SCS-Radix sort is a combinati...


Conscious Radix Sort

The exploitation of data locality in parallel computers is paramount for reducing the memory traffic and communication among processing nodes. We focus on the exploitation of locality by Parallel Radix sort. The original Parallel Radix sort has several communication steps in which one sorting key may have to visit several processing nodes. In response to this, we propose a reorganization of Radix so...


A full parallel radix sorting algorithm for multicore processors

The problem addressed in this paper is sorting an integer array a[] of length n on a multicore machine with k cores. Amdahl's law tells us that the inherently sequential part of any algorithm will in the end dominate and limit the speedup we get from parallelising that algorithm. This paper introduces PARL, a parallel left radix sorting algorithm for use on ordinary shared mem...


Keys Per Processor ( n / p ) Radix Sort Bitonic Sort Sample Sort Simple Radix Sort

We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating the primitive operations that it is capable of performing along with the cost of each operation. Next, we analyze an algorithm by making a precise count of the number of times the algorithm perform...


PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

In-place radix sort is a popular distribution-based sorting algorithm for short numeric or string keys due to its linear run-time and constant memory complexity. However, efficient parallelization of in-place radix sort is very challenging for two reasons. First, the initial phase of permuting elements into buckets suffers from a read-write dependency inherent in its in-place nature. Second, load ba...



Journal:
  • J. Parallel Distrib. Comput.

Volume 62  Issue 

Pages  -

Publication date 2000